Sparse Rectifier Neural Networks

Authors

  • Xavier Glorot
  • Yoshua Bengio
Abstract

Rectifying neurons are more biologically plausible than logistic sigmoid neurons, which are themselves more biologically plausible than hyperbolic tangent neurons; the latter, however, work better than sigmoid neurons for training multi-layer neural networks. We show that networks of rectifying neurons yield generally better performance than sigmoid or tanh networks while producing highly sparse representations with true zeros, in spite of the hard non-linearity and non-differentiability at 0.

Introduction

Despite their original connection, there is an important gap between the common artificial neural network models used in machine learning (such as those used in the recent surge of papers on deep learning; see (Bengio, 2009) for a review) and several neuroscience observations:

  • Studies of the brain's energy expense suggest that neurons encode information in a sparse and distributed way (Attwell and Laughlin, 2001), with the percentage of neurons active at the same time estimated at 1 to 4% (Lennie, 2003).
  • There are also important divergences between the non-linear activation functions assumed in learning algorithms and those used in computational neuroscience. For example, with 0 input the sigmoid has an output of 1/2, so after initialization with small weights all neurons fire at half their saturation frequency. This is biologically implausible and also hurts gradient-based optimization (LeCun et al., 1998; Bengio and Glorot, 2010). The hyperbolic tangent has an output of 0 at 0 and is therefore preferred from the optimization standpoint (LeCun et al., 1998; Bengio and Glorot, 2010), but it forces a symmetry around 0 that is not present in biological neurons. Neuroscience models of a neuron's spiking rate as a function of its input current are one-sided, saturate strongly at 0 around the threshold current, and saturate only slowly toward the maximum firing rate at large currents.

In addition, the neuroscience literature (Bush and Sejnowski, 1995; Douglas et al., 2003) indicates that cortical neurons are rarely in their saturation regime and can be approximated as rectifiers. We propose to explore rectifying non-linearities as alternatives to the sigmoid (or hyperbolic tangent) ones in deep artificial neural networks, using an L1 sparsity regularizer to prevent potential numerical problems with unbounded activations.

From the computational point of view, sparse representations have advantageous mathematical properties, such as information disentangling (different explanatory factors do not have to be compactly entangled in a dense representation) and efficient variable-size representation (the number of non-zeros may vary across inputs). Sparse representations are also more likely to be linearly separable, or separable with less non-linear machinery. Learned sparse representations have been the subject of much previous work (Olshausen and Field, 1997; Doi, Balcan and Lewicki, 2006; Ranzato et al., 2007; Ranzato and LeCun, 2007; Ranzato, Boureau and LeCun, 2008; Mairal et al., 2009), and this work is particularly inspired by the sparse representations learned by auto-encoder variants, since auto-encoders have been found to be very useful for training deep architectures (Bengio, 2009). In our experiments we use denoising auto-encoders (Vincent et al., 2008) for unsupervised pre-training, but with rectifying non-linearities in the hidden layers. Note that, for an equal number of neurons, sparsity may hurt performance because it reduces the effective capacity of the model.
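To make the L1-regularized rectifier idea concrete, here is a minimal NumPy sketch (my own toy illustration, not the authors' code; the layer sizes, corruption rate, and regularization weight lam are invented for the example). It computes a rectifier hidden code for a corrupted input, reconstructs the clean input in the spirit of a denoising auto-encoder, and adds an L1 penalty on the hidden activations, which is what keeps the unbounded rectifier outputs sparse and numerically well behaved.

    import numpy as np

    rng = np.random.default_rng(0)

    def relu(a):
        # Rectifier non-linearity: max(0, a), producing exact zeros for negative inputs.
        return np.maximum(0.0, a)

    # Toy dimensions (illustrative only).
    n_visible, n_hidden = 20, 50
    W = rng.normal(scale=0.1, size=(n_hidden, n_visible))
    b = np.zeros(n_hidden)
    W_out = rng.normal(scale=0.1, size=(n_visible, n_hidden))
    b_out = np.zeros(n_visible)

    x = rng.normal(size=n_visible)

    # Denoising-auto-encoder-style corruption: randomly zero out some inputs.
    corrupt_mask = rng.random(n_visible) > 0.25
    x_tilde = x * corrupt_mask

    h = relu(W @ x_tilde + b)        # sparse hidden code with true zeros
    x_hat = W_out @ h + b_out        # linear reconstruction of the clean input

    lam = 0.01                       # assumed strength of the L1 sparsity regularizer
    loss = 0.5 * np.sum((x_hat - x) ** 2) + lam * np.sum(np.abs(h))

    print("reconstruction + L1 loss:", loss)
    print("fraction of inactive hidden units:", np.mean(h == 0.0))

Because the rectifier outputs exact zeros, the sparsity of the learned code can be read off directly as the fraction of inactive hidden units.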
The rectifier function max(0, x) is one-sided and therefore does not enforce a sign symmetry (as the absolute-value non-linearity |x| used in (Jarrett et al., 2009) does) or antisymmetry (as a tanh(x) non-linearity does). Nevertheless, we can still obtain symmetry or antisymmetry by combining two rectifier units. The rectifier activation function also has the benefit of being piecewise linear, so computing activations is cheaper and propagating gradients along the active paths is easier (there is no gradient vanishing effect).
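A quick numerical check of that remark (my own illustration, assuming NumPy; not taken from the paper): feeding x and -x into two rectifier units and summing them reproduces the symmetric |x| non-linearity, while taking their difference gives an antisymmetric response, and on the active side the derivative is exactly 1, so gradients pass through unchanged.

    import numpy as np

    def relu(a):
        return np.maximum(0.0, a)

    x = np.linspace(-3.0, 3.0, 7)

    # Two rectifier units with opposite-sign inputs recover symmetry/antisymmetry.
    sym = relu(x) + relu(-x)         # equals |x|
    antisym = relu(x) - relu(-x)     # equals x

    print(np.allclose(sym, np.abs(x)))    # True
    print(np.allclose(antisym, x))        # True

    # Piecewise linearity: derivative is 1 on the active side, 0 on the inactive side.
    grad = (x > 0).astype(float)
    print(grad)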

Related articles

Deep Sparse Rectifier Neural Networks

Rectifying neurons are more biologically plausible than logistic sigmoid neurons, which are themselves more biologically plausible than hyperbolic tangent neurons. However, the latter work better for training multi-layer neural networks than logistic sigmoid neurons. This paper shows that networks of rectifying neurons yield equal or better performance than hyperbolic tangent networks in spite ...

Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units

Rectifier neuron units (ReLUs) have been widely used in deep convolutional networks. A ReLU converts negative values to zeros and leaves positive values unchanged, which leads to high sparsity among the neurons. In this work, we first examine the sparsity of the outputs of ReLUs in some popular deep convolutional architectures. We then use the sparsity property of ReLUs to accelerate the calcul...
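As a rough sketch of what measuring ReLU output sparsity involves (a hypothetical NumPy toy, not the cited paper's method; the feature-map shape is made up), one can simply count the fraction of exact zeros in a rectified feature map:

    import numpy as np

    rng = np.random.default_rng(0)

    def relu(a):
        return np.maximum(0.0, a)

    # A made-up convolutional-style feature map: batch x channels x height x width.
    features = rng.normal(size=(8, 64, 32, 32))
    activations = relu(features)

    # Sparsity = fraction of units that are exactly zero after rectification.
    sparsity = np.mean(activations == 0.0)
    print(f"ReLU output sparsity: {sparsity:.2%}")   # roughly 50% for zero-mean inputs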

Document Classification with Deep Rectifier Neural Networks and Probabilistic Sampling

Deep learning is regarded by some as one of the most important technological breakthroughs of this decade. In recent years it has been shown that using rectified neurons, one can match or surpass the performance achieved using hyperbolic tangent or sigmoid neurons, especially in deep networks. With rectified neurons we can readily create sparse representations, which seems especially suitable f...

Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee

Model reduction is a highly desirable process for deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affect the prediction accuracy and model variance. Net-Trim is a layer-wise convex framework to prune (sparsify) deep neural networks. The method is applicable to neural networks operating with ...

Net-Trim: A Layer-wise Convex Pruning of Deep Neural Networks

Model reduction is a highly desirable process for deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affect the prediction accuracy and model variance. Net-Trim is a layer-wise convex framework to prune (sparsify) deep neural networks. The method is applicable to neural ne...

Deep Recurrent Neural Networks for Sequential (1).pages

In the analysis of modern biological data, we often deal with ill-posed problems and missing data, mostly due to the high dimensionality and multicollinearity of the dataset. In this paper, we propose a system based on matrix factorization (MF) and deep recurrent neural networks (DRNNs) for genotype imputation and phenotype sequence prediction. In order to model the long-term dependencie...

Publication year: 2010